Abstract

Background: Perennial ryegrass (Lolium perenne L.) is one of the most important forage and turf grass species of
temperate regions worldwide. Its mitochondrial genome is inherited maternally and contains genes that can 
influence traits of agricultural importance. Moreover, the DNA sequence of mitochondrial genomes has been 
established and compared for a large number of species in order to characterize evolutionary relationships. 
Therefore, it is crucial to understand the organization of the mitochondrial genome and how it varies between and 
within species. Here, we report the first de novo assembly and annotation of the complete mitochondrial genome 
from perennial ryegrass. 

Results: Intact mitochondria from perennial ryegrass leaves were isolated and used for mtDNA extraction. The 
mitochondrial genome was sequenced to a 167-fold coverage using the Roche 454 GS-FLX Titanium platform, and 
assembled into a circular master molecule of 678,580 bp. A total of 34 proteins, 14 tRNAs and 3 rRNAs are encoded 
by the mitochondrial genome, giving a total gene space of 48,723 bp (7.2%). Moreover, we identified 149 open 
reading frames larger than 300 bp and covering 67,410 bp (9.93%), 250 SSRs, 29 tandem repeats, 5 pairs of large 
repeats, and 96 pairs of short inverted repeats. The genes encoding subunits of the respiratory complexes - nad1
to nad9, cob, cox1 to cox3 and atp1 to atp9 - all showed high expression levels both in absolute numbers and after
normalization. 

Conclusions: The circular master molecule of the mitochondrial genome from perennial ryegrass presented here 
constitutes an important tool for future attempts to compare mitochondrial genomes within and between grass 
species. Our results also demonstrate that mitochondria of perennial ryegrass contain genes crucial for energy 
production that are well conserved in the mitochondrial genome of monocotyledonous species. The expression 
analysis gave us first insights into the transcriptome of these mitochondrial genes in perennial ryegrass. 

Keywords: De novo assembly, Mitochondrial gene expression, Mitochondrial genome, Next-generation sequencing, 
Perennial ryegrass (Lolium perenne L.)



Background 

Mitochondria are semi-autonomous organelles in eukary- 
otes. Their primary function is the production of meta- 
bolic intermediates and cellular ATP through the citric 
acid cycle and oxidative phosphorylation pathway. For this 
reason, mitochondria are involved in a wide variety of cel- 
lular and developmental processes including pollen devel- 
opment and cytoplasmic male sterility (CMS) [1,2]. 
Mitochondria have their own genomes, which harbor
genes for ribosomal RNAs (rRNAs), transfer RNAs
(tRNAs) and subunits of the respiratory complexes. Exten- 
sive research has been performed to understand orga- 
nization and function of mitochondrial genomes. To date 
(September 30, 2012), more than 70 plant mitochondrial 
genomes have been sequenced, including those of 22 
seed plant species
(http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=33090&opt=organelle),
and of a
large number of protists, algae, fungi, and animals. These 
studies have greatly improved our understanding of mito- 
chondrial gene content, genome size and organization, 
mutation rate as well as gene shuffling events. The 
sequenced mitochondrial genomes exhibit significant
variation in both size and actual gene content, despite the
universally conserved sequence that exists between the 
mitochondrial genomes of diverse species [3]. The size of 
sequenced plant mitochondrial genomes varies more than 
12-fold among angiosperms, ranging from 208 kbp in 
white mustard (Brassica hirta) [4] to over 2,700 kbp in
muskmelon (Cucumis melo) [5], while the number of 
genes only varies between 50 and 69 including 30 to 37 
protein-coding genes [6]. The significant variation in size 
of the mitochondrial genome between species is explained 
by expansion of the inter-genic regions, structural re- 
arrangements and intra- or intermolecular recombination 
events [7]. Gene shuffling events in higher plant mito- 
chondrial genomes have occurred due to the presence of 
repeated sequences [8]. In combination with sequence du- 
plication events, this has resulted in a unique diversity of 
plant mitochondrial genomes [9]. Therefore, the DNA se- 
quence of plant mitochondria has become an important 
tool in phylogenetics for comparison of the evolutionary 
relationships among species. In addition, sequencing of 
the mitochondrial genome has the potential to increase 
our understanding of the complex genetic interactions be- 
tween the nuclear and the organellar genomes. 

Perennial ryegrass (Lolium perenne L.) is a diploid 
(2n = 2x = 14) member of the Poaceae family and one 
of the most important forage and turf grass species of 
temperate regions worldwide [10]. Its economic import- 
ance has led to the establishment of high-density genetic 
maps as well as genome and transcriptome sequence re- 
sources. For example, the complete chloroplast genome 
sequence has recently been published [11], and assembly 
of the genome sequence is currently in progress
[12]. However, the complete mitochondrial genome se- 
quence of perennial ryegrass as well as of any other for- 
age and turf grass species was hitherto unknown. 

Therefore, the main objective of this study was to se- 
quence, assemble and annotate the perennial ryegrass 
mitochondrial genome. Specifically, we aimed at (i) de- 
scribing the organization of the perennial ryegrass mito- 
chondrial genome for future comparative analyses of 
mitochondrial genomes within Lolium and between 
closely related grass species, (ii) identifying protein-coding 
genes, rRNA genes, tRNA genes and open reading frames 
(ORFs) to understand the function of the mitochondrial
genome, and (iii) gaining first insights into the mitochondrial
transcriptome of perennial ryegrass. 

Results 

Isolation of intact mitochondria and extraction of mtDNA 

A cellular fraction containing crude mitochondria was
isolated from perennial ryegrass leaf tissue by homo- 
genization followed by differential centrifugation. Fur- 
ther attempts to purify the mitochondria by Percoll 
density gradient centrifugation failed. The crude
mitochondrial fraction was characterized by measuring
the activity and latency of cytochrome c oxidase (CCO) 
as a marker enzyme for the intactness of the inner 
mitochondrial membrane, and malate dehydrogenase 
(MDH), an enzyme residing in the mitochondrial matrix 
as well as in the cytosol and several other places in the 
cell [13] (Table 1). The large increase in specific CCO ac- 
tivity indicates that there was a 7.7-fold enrichment of 
mitochondria from the homogenate to the crude mito- 
chondrial fraction as expected. The latency of CCO mea- 
sures the ability of the substrate, reduced cytochrome c, to 
reach the active site of the enzyme on the outer surface of 
the inner membrane. The high CCO latency in both hom- 
ogenate and crude mitochondria indicates that the outer 
membrane was mainly intact [14,15]. Only a small fraction 
of MDH activity co-purified with the mitochondria, but its 
latency increased dramatically (3.5-fold) indicating that 
the major part of the lost MDH had been present outside 
the mitochondria and that the crude mitochondria contain 
most of their MDH behind the permeability barrier of an
intact inner membrane [16]. Thus, the crude mitochondrial
fraction contained mainly intact mitochondria (90%), in 
which the mtDNA was protected inside intact outer and 
inner membranes (Table 1). 
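
The enrichment and recovery figures quoted above follow directly from the means in Table 1. The short sketch below recomputes them; the dictionary layout and function names are illustrative only and are not part of the original analysis.

```python
# Means copied from Table 1 (two isolations, 30 g fresh leaves each).
homogenate = {"cco_total": 13.0, "cco_specific": 18.0,
              "mdh_total": 214.0, "mdh_specific": 291.0, "mdh_latency": 22.0}
crude_mito = {"cco_total": 5.87, "cco_specific": 139.0,
              "mdh_total": 26.5, "mdh_specific": 639.0, "mdh_latency": 77.5}


def fold_change(key):
    """Ratio of the crude mitochondrial value to the homogenate value."""
    return crude_mito[key] / homogenate[key]


def recovery_percent(key):
    """Share of the homogenate total recovered in the crude mitochondria."""
    return 100.0 * crude_mito[key] / homogenate[key]


print(f"CCO specific activity: {fold_change('cco_specific'):.1f}-fold enrichment")
print(f"CCO total activity recovered: {recovery_percent('cco_total'):.0f}%")
print(f"MDH total activity recovered: {recovery_percent('mdh_total'):.0f}%")
print(f"MDH specific activity: {fold_change('mdh_specific'):.1f}-fold")
print(f"MDH latency: {fold_change('mdh_latency'):.1f}-fold increase")
```

The ratios reproduce the 7.7-fold, 45%, 12%, 2.2-fold and 3.5-fold values given in parentheses in Table 1.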

Subsequently, contaminating nuclear and chloroplast 
DNA was removed by treating the crude mitochondrial 
fraction with DNase. The isolation of mtDNA from intact
mitochondria from 60 g (two batches of 30 g) fresh
weight leaves resulted in 3.5 µg mtDNA.

Sequencing and assembly of the perennial ryegrass 
mitochondrial genome 

A total of 287,367 single reads were generated with a 
mean length of 403 bp (approximately 116 Mbp of se- 
quence information) from the mitochondrial genome of 
the perennial ryegrass genotype F1-30 using Roche 454
GS-FLX Titanium sequencing (Table 2). This resulted in 
a 167-fold coverage of the mitochondrial genome. The 
contaminating chloroplast sequence reads were removed 
by performing a reference assembly against the chloro- 
plast genome (GenBank Acc. No.: NC_009950.1). The 
isolated mitochondrial DNA was contaminated with ap- 
proximately 2% chloroplast DNA (Table 2). 
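
As a quick plausibility check, the coverage and the chloroplast contamination can be recomputed from the read statistics in Table 2. The sketch below assumes that the reported 167-fold coverage refers to the bases remaining after removal of chloroplast reads, which is the figure that matches; the variable names are illustrative.

```python
# Read statistics copied from Table 2.
total_reads = 287_367
reads_after_qc = 281_706        # after removal of chloroplast-matching reads
bases_after_qc = 113_583_420    # total bases in the retained reads
genome_size = 678_580           # final circular master molecule (bp)

# Fraction of reads that matched the chloroplast reference genome.
chloroplast_fraction = (total_reads - reads_after_qc) / total_reads

# Average depth: retained bases divided by genome length (assumption, see above).
coverage = bases_after_qc / genome_size

print(f"chloroplast reads: {chloroplast_fraction:.1%}")
print(f"coverage: {coverage:.0f}-fold")
```

Using the unfiltered 115,768,584 bases instead would give roughly 171-fold, so the filtered read set appears to be the basis of the reported value.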

The initial assembly generated 2,403 contigs totaling
1.7 Mbp with an average length of 703 nucleotides. The
longest contig was 219,170 bp, and the shortest contig
was 116 bp. A BLASTn search against the NCBI nucleotide
collection was performed to remove contaminating contigs.
Nine of the 2,403 contigs were identified as plant
mitochondrial DNA sequences with a mean length of
80,314 bp (total size 722,827 bp). The remaining 2,394
contigs, corresponding to 0.83% of the 116 Mbp of total
sequence information, were contaminating sequences and
were discarded. These contigs were mainly single-read
sequences related to other organisms (Table 2). The nine
mitochondrial contigs of the initial assembly could not
be further arranged into a single circular molecule,
mainly for two reasons. Firstly, there were cases of
misassembled contigs, and secondly, there were cases
where repetitive sequences led to a breakdown of the
assembly process. To further resolve the arrangement of
the nine contigs, we made use of the sequence information
available from a perennial ryegrass nuclear genome
sequencing project that is ongoing in our lab, which
contained assembled contigs and scaffolds originating
from the mitochondrial genome. This assembly included
mate-pair Illumina libraries with insert sizes of up to
9 kbp, which helped us to predict the order and
orientation of a number of the nine mitochondrial contigs.
Primers were then designed to span the contig ends, and
Sanger sequencing was used to fill in the gaps and,
ultimately, merge the contigs (Figure 1). The complete
nucleotide sequence of the mitochondrial genome of
perennial ryegrass has been deposited in GenBank under
the accession number JX999996.

Table 1 Characterization of mitochondrial enrichment and intactness

Parameters                                    Fraction*
                                              Homogenate      Crude mitochondria
Protein         Total (mg)                    731 ± 40        42.0 ± 7
CCO activity    Total (µmol min⁻¹)            13.0 ± 2.90     5.87 ± 1.41 (45%)
                Specific (nmol min⁻¹ mg⁻¹)    18.0 ± 2.83     139 ± 9.90 (7.7-fold)
                Latency (%)                   90.0 ± 1.41     89.5 ± 0.71
MDH activity    Total (µmol min⁻¹)            214 ± 61.5      26.5 ± 0.71 (12%)
                Specific (nmol min⁻¹ mg⁻¹)    291 ± 68.6      639 ± 91.2 (2.2-fold)
                Latency (%)                   22.0 ± 1.41     77.5 ± 0.71 (3.5-fold)
Yield of mtDNA  Total (µg)                                    3.47 ± 0.27

*Data are means ± SD of two separate isolations each of 30 g fresh leaves.



Features of the perennial ryegrass mitochondrial genome 

The genome size was 678,580 bp with a G+C content of 
44.1%. Annotation of the mitochondrial genome was 
performed, and a total of 73 genes, comprising protein-coding
genes, rRNA genes and tRNA genes, as well as 149 ORFs were
identified. These regions account for 21.03% of the genome
(Figure 1, Table 3, Table 4 and Additional file 1: Table S1).



Protein-coding genes 

The perennial ryegrass mitochondrial genome contains 
39 genes encoding 34 different proteins including one 
pseudogene, rps4-p (Table 4). The genes encode 19 pro- 
teins of the electron transport chain. They include nine 
subunits of complex I: NADH dehydrogenase subunits 
1, 2, 3, 4, 4L, 5, 6, 7 and 9 (nad1, 2, 3, 4, 4L, 5, 6, 7 and
9), of which nad7 has two copies; one subunit of complex III:
apocytochrome b (cob); three subunits of complex IV:
cytochrome c oxidase subunits 1, 2 and 3 (cox1, 2 and 3);
and five subunits of complex V: ATP synthase F1
subunits 1, 4, 6, 8 and 9 (atp1, 4, 6, 8 and 9). No genes
were found to encode subunits of complex II: succinate
dehydrogenase subunits 3 and 4 (sdh3 and sdh4). In
addition, four genes encode proteins involved in cytochrome c
biogenesis: subunits B, C and F (ccmB, C, FN
and FC). Two genes, matR and mttB, encode maturase
and transport membrane proteins, respectively. The thirteen
genes rps1, rps2, rps3, rps4, rps7-1, rps7-2, rps12,
rps13, rps14-1 and rps14-2, and rpl5-1, rpl5-2 and rpl16
encode ribosomal proteins. In total, the protein-coding 
regions cover 5.23% of the mitochondrial genome 
(Figure 1, Table 3 and Table 4). 
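
The genome fractions reported here and in Table 3 can be reproduced by dividing each feature length by the genome size. The sketch below does this with the values from Table 3; the dictionary keys are illustrative labels. It also shows that the 142,764 nt total listed in Table 3 equals the sum of protein-coding, RNA, intron and ORF sequences.

```python
GENOME_SIZE = 678_580  # bp

# Feature lengths in nucleotides, copied from Table 3.
features = {
    "protein-coding": 35_514,
    "RNA genes": 13_209,
    "introns": 26_631,
    "ORFs (>300 bp)": 67_410,
}


def percent_of_genome(nt):
    """Length expressed as a percentage of the master molecule."""
    return 100.0 * nt / GENOME_SIZE


for name, nt in features.items():
    print(f"{name:15s} {nt:7,d} nt  {percent_of_genome(nt):5.2f}%")

# Gene space as quoted in the abstract: proteins plus RNAs.
gene_space = features["protein-coding"] + features["RNA genes"]
print(f"gene space      {gene_space:7,d} nt  {percent_of_genome(gene_space):5.1f}%")

# Sum of all four classes, matching the total in Table 3.
total = sum(features.values())
print(f"total           {total:7,d} nt")
```

Summing the four rounded percentages (5.23 + 1.95 + 3.92 + 9.93) gives the 21.03% quoted in the text.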



Table 2 Summary of the perennial ryegrass mitochondrial genome sequencing and assembly

Parameters                                         Counts     Average length          Total bases
                                                              (nucleotides per
                                                              read or contig)
Total number of single reads                       287,367    403                     115,768,584
Total number of single reads after quality control* 281,706   403                     113,583,420
Number of initial contigs                          2,403      703                     1,689,209
Number of mitochondrial contigs                    9          80,314                  722,827
Number of final contigs after closing all gaps     1
Genome size                                                                           678,580
Coverage                                                                              167-fold

*Removal of reads matching chloroplast DNA.



Islam et al. BMC Genomics 2013, 14:202 
http://www.biomedcentral.com/1471-2164/14/202 






Figure 1 Map of the perennial ryegrass mitochondrial genome. Protein, tRNA and rRNA-coding genes are shown inside and outside the 
circles. The second outer circle represents the circular master molecule. Genes and exons are indicated by arrowheads, p indicates a pseudogene. 
The forward and reverse DNA strands are shown in clockwise and anticlockwise orientation, respectively. The middle black peaked circle 
represents the G+C content of the master molecule. The inner circle shows the size markers in kbp in clockwise orientation. The first nucleotide 
of the cox1 gene is the starting point of the circular master molecule. This figure was generated using the CGView Server [17].



RNA-coding genes 

The perennial ryegrass mitochondrial genome contains a 
total of 34 RNA-coding genes: three rRNA genes 
(present in two copies each) for the ribosomal subunits 
18S, 26S and 5S, and 28 tRNA genes including one
pseudogene, tRNA-Phe (GAA) (Table 4). The anticodons of
the 28 tRNA genes match the codons of a total of 14 amino
acids. The RNA-coding genes represent 1.95% of the
mitochondrial genome (Figure 1, Table 3). The length of
the rRNA genes ranges from 122 to 3,461 nucleotides,
and that of the tRNA genes from 71 to 88 nucleotides
(Table 4). No tRNA genes were found for the amino
acids alanine (Ala), arginine (Arg), glycine (Gly),
isoleucine (Ile), threonine (Thr) and valine (Val) in the
perennial ryegrass mitochondrial genome (Table 5).
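
The relationship between the anticodons listed in Table 4 and the 14 amino acids can be illustrated by reverse-complementing each anticodon and translating the resulting codon. The sketch below assumes the standard genetic code and ignores RNA editing and wobble pairing; it is an illustration, not the annotation procedure used in this study.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ONE_LETTER = {"Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q", "Glu": "E",
              "His": "H", "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F",
              "Pro": "P", "Ser": "S", "Trp": "W", "Tyr": "Y"}

# (tRNA type, anticodon) pairs as listed in Table 4, written 5'->3' as DNA.
TRNA_ANTICODONS = [
    ("Asn", "GTT"), ("Asp", "GTC"), ("Cys", "GCA"), ("Gln", "TTG"),
    ("Glu", "TTC"), ("His", "GTG"), ("Leu", "CAA"), ("Lys", "TTT"),
    ("Met", "CAT"), ("Phe", "GAA"), ("Pro", "TGG"), ("Ser", "GCT"),
    ("Ser", "TGA"), ("Trp", "CCA"), ("Tyr", "GTA"),
]


def codon_for(anticodon):
    """Codon that pairs perfectly with the anticodon (reverse complement)."""
    return anticodon.translate(COMPLEMENT)[::-1]


# Every anticodon should decode to the amino acid it is annotated with.
consistent = all(CODON_TABLE[codon_for(ac)] == ONE_LETTER[aa]
                 for aa, ac in TRNA_ANTICODONS)
decoded = {CODON_TABLE[codon_for(ac)] for _, ac in TRNA_ANTICODONS}
missing = set(AMINO_ACIDS) - {"*"} - decoded

print(consistent, len(decoded), sorted(missing))
```

Under these assumptions the 15 distinct anticodons decode to 14 amino acids, and the six amino acids without a matching tRNA gene are A, G, I, R, T and V, in agreement with the text.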

Introns 

Among the 39 protein-coding genes, only nine (nad1,
nad2, nad4, nad5, nad7-1, nad7-2, cox2, ccmFC and
rps3) contain introns. All the introns in sequenced
mitochondrial genomes are classified as group II introns [18].
A total of 26 group II introns were found within the
nine protein-coding genes, including four trans-spliced
introns that are part of nad1 and nad2 (Table 4). In total,
twenty-two cis-spliced introns are present in nad1, nad2,
nad4, nad5, nad7, cox2, ccmFC and rps3. Among the 28 tRNA
genes, one tRNA gene, trnL (CAA), contains an intron.

Table 3 Main features of the assembled perennial ryegrass mitochondrial genome

Genome size (nt)                                  678,580
G+C content (%)                                   44.1
Gene number                                       73 a
 -Protein coding genes                            39 (34) b (1) c
 -tRNA genes                                      28 (14) b (1) c
 -rRNA genes                                      6 (3) b
Proteins
 -Protein coding sequences (nt)                   35,514
 -Protein coding sequences (%)                    5.23
RNAs
 -RNA gene coding sequences (nt)                  13,209
 -RNA gene coding sequences (%)                   1.95
Introns
 -Cis-spliced group II introns                    22
 -Trans-spliced group II introns                  4
 -Intron sequences of these genes (nt)            26,631
 -Intron sequences of these genes (%)             3.92
ORFs
 -No. of ORFs                                     149
 -ORFs (>300 bp) sequences (nt)                   67,410
 -ORFs sequences (%)                              9.93
Total (proteins, RNAs and ORFs) sequences         142,764
Total (proteins, RNAs and ORFs) sequences (%)     21.03
Repeats
 -Large repeat sequences (nt)                     116,177
 -Large repeat sequences (%)                      17.12
Transposable element-related sequences (%)        3.76

a Total numbers including duplications and pseudogenes; b in parentheses,
number of unique genes; c in parentheses, number of pseudogenes.
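
The intron total can be cross-checked against the exon entries in Table 4, since a gene with n exons contains n - 1 introns. The sketch below uses exon counts read from the lower-case exon rows of Table 4; the dictionary itself is an illustrative construct.

```python
# Number of exons per intron-containing gene, counted from Table 4
# (rows with lower-case suffixes a, b, c, ...).
exon_counts = {
    "nad1": 5, "nad2": 5, "nad4": 4, "nad5": 5,
    "nad7-1": 5, "nad7-2": 5, "cox2": 2, "ccmFC": 2, "rps3": 2,
}

# A gene with n exons is interrupted by n - 1 introns.
introns_per_gene = {gene: n - 1 for gene, n in exon_counts.items()}
total_introns = sum(introns_per_gene.values())

TRANS_SPLICED = 4  # as reported in the text and in Table 3
cis_spliced = total_introns - TRANS_SPLICED

print(total_introns, cis_spliced)
```

The counts agree with the 26 group II introns, 22 of them cis-spliced, given in the text.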

Open reading frames (ORFs) 

An ORF may be defined as an in-frame DNA sequence 
of 300 bp or longer that is bordered by a start and stop 
codon, with no match to a coding sequence in the public 
databases [19]. In the perennial ryegrass mitochondrial 
genome, we found 149 ORFs with a minimum and max- 
imum length of 303 and 2,571 bp, respectively, and with 
a mean length of 452 bp covering 9.93% of the genome 
(Table 3, Additional file 1: Table S1).
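
The structural part of this ORF definition (an in-frame stretch of at least 300 bp between a start and a stop codon) can be sketched as a six-frame scan. The code below is a minimal illustration assuming ATG as the only start codon and the standard stop codons; it does not perform the database comparison that the definition also requires, and it is not the software used in this study.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_length=300):
    """Return (strand, start, end, length) for each ORF of at least min_length bp.

    Coordinates are 0-based, half-open, and refer to the scanned strand.
    The first in-frame ATG after the previous stop opens an ORF, so nested
    starts within the same frame are not reported separately.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    length = i + 3 - start  # includes start and stop codons
                    if length >= min_length:
                        orfs.append((strand, start, i + 3, length))
                    start = None
    return orfs


# Toy example: one 306 bp ORF on the forward strand.
demo = "ATG" + "GCA" * 100 + "TAA"
print(find_orfs(demo))  # [('+', 0, 306, 306)]
```

Candidates found in this way would still have to be screened against known coding sequences before being counted as ORFs in the sense used here.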



Repetitive regions and their gene content 

A variety of repetitive DNA sequences were found in the 
perennial ryegrass mitochondrial genome. There are four 
pairs of large inverted repeat (IR) sequences, with repeat 
lengths of 50,267, 30,833, 24,985 and 1,534 bp, as well as 
one large directly repeated (DR) sequence of 8,558 bp 
(Figure 2), with 99% sequence identity. Overall, these 
five large repeats account for 17.12% of the mitochondrial
genome. The genes nad7, rps7, rps14, rpl5, rrn5,
rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS
and trnW were found as multiple copies located in the
large inverted repeat sequences in the perennial ryegrass 
mitochondrial genome (Figure 2, Table 4). 
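
The repeat lengths and their combined share of the genome can be recomputed from the coordinates of the first copy of each repeat as given in the caption of Figure 2. The sketch below is illustrative; the data structure is not part of the original analysis.

```python
GENOME_SIZE = 678_580  # bp

# First-copy coordinates (1-based, inclusive) from the Figure 2 caption.
first_copies = {
    "IR1": (178_334, 228_600),
    "IR2": (147_726, 178_558),
    "IR3": (106_923, 131_907),
    "IR4": (94_022, 95_555),
    "DR": (498_590, 507_147),
}

# Inclusive coordinates: length = end - start + 1.
lengths = {name: end - start + 1 for name, (start, end) in first_copies.items()}
total_length = sum(lengths.values())
share = 100.0 * total_length / GENOME_SIZE

print(lengths)
print(f"{total_length:,} bp = {share:.2f}% of the genome")
```

The result matches the repeat lengths listed above and the 116,177 nt (17.12%) given in Table 3, which indicates that each repeat pair is counted once in that figure.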

A total of 96 pairs of short inverted repeat (SIR) se- 
quences were identified covering 4,886 bp (0.72%) of the 
mitochondrial genome. Percent matches between SIRs 
were higher than 68%, with 83 pairs of SIRs showing 
values higher than 80%, while thirteen pairs of SIRs had 
a sequence identity of 100%. The average SIR length 
was 52 bp, and the longest SIR identified was 333 bp 
(Additional file 2: Table S2). 

The mitochondrial genome of perennial ryegrass 
contained 29 tandem repeats, which covered 1,647 bp 
corresponding to 0.24% of the total sequence. The aver- 
age period size and copy number were 26.07 and 2.17, 
respectively. The most common type of tandem repeat 
had period sizes of 42, 14 and 12, which totaled 42% of 
all tandem repeats found in the mitochondrial genome 
(Additional file 3: Table S3). 

Simple sequence repeat sequences 

We found 250 SSRs in the perennial ryegrass mitochon- 
drial genome including 23, 196, 26 and 5 with mono-, 
tri-, tetra- and pentanucleotide repeats, respectively. 
SSRs with dinucleotide repeats were not found. The 
length of the mononucleotide, trinucleotide and
pentanucleotide repeats ranges from 10 to 13, 9 to 12 and
15 to 20 bp, respectively. All the tetranucleotide repeats
are 12 bp long (Additional file 4: Table S4). Of 196 trinu- 
cleotide repeats, only 14 (7.14%) were present in the 
coding regions in the perennial ryegrass mitochon- 
drial genome. 
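
A simple way to locate SSRs of this kind is a regular-expression scan for a motif followed by copies of itself. The minimum copy numbers in the sketch below (10 for mononucleotides, 3 for tri-, tetra- and pentanucleotides) are assumptions inferred from the length ranges reported above, not the documented settings of the tool used in this study. Dinucleotide motifs are omitted because no threshold can be inferred for them from the text.

```python
import re

# Motif length -> minimum number of copies (assumed thresholds, see lead-in).
MIN_COPIES = {1: 10, 3: 3, 4: 3, 5: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    hits = []
    for unit, min_copies in MIN_COPIES.items():
        # A motif of `unit` bases followed by at least min_copies - 1 repeats.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit > 1 and len(set(motif)) == 1:
                continue  # e.g. AAA repeated is a mononucleotide run
            copies = len(match.group(0)) // unit
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Toy sequence with one mono-, one tri- and one tetranucleotide repeat.
demo = "GC" + "A" * 11 + "CGT" + "ATC" * 3 + "G" + "AGCT" * 3 + "T"
print(find_ssrs(demo))  # [(2, 'A', 11), (16, 'ATC', 3), (26, 'AGCT', 3)]
```

Positions are 0-based offsets into the scanned sequence; intersecting them with the gene coordinates in Table 4 would indicate which repeats fall in coding regions.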

Transposable element-related sequences 

The presence of transposable elements (TEs) was also 
investigated by searching against two different TE
databases, Poaceae and Triticeae, from the Genetic
Information Research Institute
(http://www.girinst.org/censor/index.php).
In total, 22,545 bp (3.32%) and 3,008 bp (0.44%)
of the total genome sequence showed homology with
TEs in Poaceae and Triticeae, respectively. The TEs
were mainly long terminal repeat (LTR) retrotransposon
elements (Table 6). The circular master molecule
Table 4 Genes identified in the perennial ryegrass mitochondrial genome

Name of      Size b    Position in the genome    Strand c   No. of
the gene a   (bp)      From        To                       amino acids

A. Protein and rRNA-coding genes

1. Complex I genes
nad1         978       -           -             x          325
nad1a        387       284,188     284,574       +          129
nad1b        81        247,595     247,675       -          27
nad1c        192       245,995     246,186       -          64
nad1d        60        512,368     512,427       -          20
nad1e        258       76,540      76,797        -          85
nad2         1,467     -           -             x          488
nad2a        153       324,853     325,005       -          51
nad2b        396       323,224     323,619       -          132
nad2c        159       588,650     588,809       +          53
nad2d        573       591,214     591,786       +          191
nad2e        186       593,194     593,381       +          61
nad3         357       137,432     137,788       -          118
nad4         1,488     -           -             +          495
nad4a        461       43,953      44,413        +          153
nad4b        515       45,442      45,956        +          172
nad4c        423       48,576      48,998        +          141
nad4d        89        51,790      51,878        +          29
nad4L        315       263,099     263,413       +          104
nad5         2,013     -           -             -          670
nad5a        228       637,049     637,277       -          76
nad5b        1,218     634,967     636,186       -          406
nad5c        21        76,157      76,177        -          7
nad5d        396       237,378     237,773       -          132
nad5e        150       236,297     236,446       -          49
nad6         945       510,437     511,381       -          314
nad7-1       1,185     -           -             +          394
nad7-1a      143       213,945     214,087       +          47
nad7-1b      69        214,890     214,958       +          23
nad7-1c      467       216,269     216,735       +          155
nad7-1d      244       217,735     217,978       +          81
nad7-1e      262       219,684     219,945       +          86
nad7-2       1,185     -           -             -          394
nad7-2a      143       449,072     449,214       -          47
nad7-2b      69        448,201     448,269       -          23
nad7-2c      467       446,424     446,890       -          155
nad7-2d      244       445,181     445,424       -          81
nad7-2e      262       443,214     443,475       -          86
nad9         573       669,561     670,133       +          190
Table 4 Genes identified in the perennial ryegrass mitochondrial genome (Continued)

Name of      Size      Position in the genome    No. of
the gene     (bp)      From        To            amino acids

2. Complex III gene
cob          1,194     232,540     233,733       397

3. Complex IV genes
cox1         1,989     1           1,989         662
cox2         783       -           -             260
cox2a        388       484,835     485,222       130
cox2b        395       486,506     486,897       130
cox3         843       670,403     671,245       280

4. Complex V genes
atp1         1,530     312,019     313,548       509
atp4         588       335,648     336,235       195
atp6         714       415,086     415,799       237
atp8         468       292,308     292,775       155
atp9         225       35,605      35,829        74

5. Cytochrome c biogenesis genes
ccmB         651       132,035     132,685       216
ccmC         723       549,113     549,835       240
ccmFN        1,770     83,571      85,340        589
ccmFC        1,434     -           -             477
ccmFCa       755       519,648     520,402       252
ccmFCb       679       521,399     522,077       225

6. Ribosomal protein genes
rps1         636       82,801      83,436        211
rps2         1,464     575,833     577,296       487
rps3         1,650     -           -             549
rps3a        72        228,586     228,657       24
rps3b        1,578     229,590     231,167       525
rps4         1,035     93,583      94,617        344
rps4-p       1,032     603,531     604,562       343
rps7-1       447       177,808     178,254       148
rps7-2       447       380,279     380,725       148
rps12        378       137,010     137,387       125
rps13        351       248,209     248,559       116
rps14-1      159       190,863     191,021       52
rps14-2      159       472,138     472,296       52
rpl5-1       552       190,178     190,729       183
rpl5-2       552       472,430     472,981       183
rpl16        558       231,058     231,615       185

7. Other protein coding genes
matR         1,977     77,314      79,290        658
mttB         816       604,582     605,397       271
Table 4 Genes identified in the perennial ryegrass mitochondrial genome (Continued)

8. rRNA genes
rrn5-1, rrn5-2, rrn18-1, rrn18-2, rrn26-1 and rrn26-2 [sizes, positions and strands illegible in the source]




Name of      Size      Position in the genome    Strand   tRNA    Anti-
the gene a   (bp)      From        To                     type    codon

B. tRNA-coding genes
trnN         72        616,184     616,255       -        Asn     GTT
trnD-1       74        129,749     129,822       -        Asp     GTC
trnD-2       74        338,298     338,371       +        Asp     GTC
trnD-3       74        615,471     615,544       +        Asp     GTC
trnC-1       71        221,721     221,791       -        Cys     GCA
trnC-2       71        441,367     441,437       +        Cys     GCA
trnQ-1       72        151,087     151,158       +        Gln     TTG
trnQ-2       72        185,132     185,203       +        Gln     TTG
trnQ-3       72        407,374     407,445       -        Gln     TTG
trnQ-4       72        477,956     478,027       -        Gln     TTG
trnE         73        649,329     649,401       +        Glu     TTC
trnH         74        304,917     304,990       +        His     GTG
trnL         71        -           -             -        Leu     CAA
trnLa        37        549,042     549,076       -
trnLb        34        548,992     549,029       -
trnK-1       73        174,964     175,036       -        Lys     TTT
trnK-2       73        383,497     383,569       +        Lys     TTT
trnfM        74        11,905      11,978        -        Met     CAT
trnM-1       74        615,054     615,127       -        Met     CAT
trnM-2       73        630,497     630,569       +        Met     CAT
trnF         73        371,486     371,558       -        Phe     GAA
trnP-1       74        172,177     172,250       -        Pro     TGG
trnP-2       75        282,933     283,007       +        Pro     TGG
trnP-3       74        386,282     386,355       +        Pro     TGG
trnS-1       88        138,301     138,388       -        Ser     GCT
trnS-2       86        379,430     379,515       +        Ser     TGA
trnW-1       74        171,961     172,034       -        Trp     CCA
trnW-2       74        386,498     386,571       +        Trp     CCA
trnY         83        587,933     588,015       +        Tyr     GTA
trnF-p       73        262,935     263,007       +        Phe     GAA

A total of 39 protein-coding genes and 6 rRNA-coding genes (A) and 28 tRNA-coding genes (B) were identified. a p, pseudogene; b Boldface, sum of all exons;
lower-case letters, exons of protein-coding genes; hyphenated, duplicate genes; c Plus and minus, encoded by the forward and reverse DNA strand, respectively;
x, trans-spliced.
Table 5 Differences in the tRNA gene content in sequenced mitochondrial genomes of grasses

tRNA     Plant species
genes    Lolium    Ferrocalamus    Bambusa    Triticum   Oryza    Oryza       Sorghum   Zea    Zea
         perenne   rimosivaginus   oldhamii   aestivum   sativa   rufipogon   bicolor   mays   perennis
trnA     -         -               -          -          -        -           -         -      -
trnG     -         -               -          -          -        -           -         -      -
trnP     +         +               +          +          +        +           +         +      +
trnT     -         -               -          -          -        -           -         -      -
trnV     -         -               -          -          -        -           -         -      -
trnS     +         +               +          +          +        +           +         +      +
trnR     -         -               -          -          -        -           -         -      -
trnL     +         -               -          -          -        -           -         -      -
trnF     +         +               +          +          +        +           +         +      +
trnN     +         +               +          +          +        +           +         +      +
trnK     +         +               +          +          +        +           +         +      +
trnD     +         +               +          +          +        +           +         +      +
trnE     +         +               +          +          +        +           +         +      +
trnH     +         +               +          -          +        +           +         +      +
trnQ     +         +               +          +          +        +           +         +      +
trnI     -         +               +          +          +        +           +         +      +
trnfM    +         +               +          +          +        +           +         +      +
trnW     +         +               +          +          +        +           +         -      -
trnC     +         +               +          +          +        +           +         +      +
trnY     +         +               +          +          +        +           +         +      +
trnM     +         +               +          +          +        +           +         +      +

The accession numbers of the mitochondrial genomes are: Ferrocalamus rimosivaginus (JQ235166 to JQ235179), Bambusa oldhamii (EU365401), Triticum aestivum
(AP008982), Oryza sativa (BA000029), O. rufipogon (NC_013816), Sorghum bicolor (NC_008360), Zea mays (AY506529) and Z. perennis (NC_008331). +, present;
-, absent.



coordinates of the TEs are presented in Additional 
file 5: Table S5. 

Transcriptome analyses 

We performed an expression analysis of the 39 mito- 
chondrial protein-coding genes (Table 7) using in-house 
RNA-seq data from the reproductive tissue of perennial 
ryegrass inflorescence (unpublished). The results are 
presented both as the total number of reads matching the
genes and as the number of reads corrected for the gene
length (normalized expression). The most abundantly
expressed genes in terms of total numbers were cob,
cox1 and atp1, which all had more than 10,000 matching
reads. However, when comparing normalized expression,
the most highly expressed genes were nad9, cob, cox3,
atp9 and rps12 with more than 10,000 reads per kbp
gene length. Eleven genes had low expression (<1,000
matching reads per kbp gene length), namely ccmB,
rps4, rps4-p, rps7-1, rps7-2, rps14-1, rps14-2, rpl5-1,
rpl5-2, matR and mttB. Of these, the genes rps14-1,
rps14-2, rpl5-1 and rpl5-2 had fewer than 100 matching
reads per kbp gene length. The genes encoding subunits
of the respiratory complexes (nad1 to nad9, cob, cox1 to
cox3 and atp1 to atp9) all showed high expression both
in absolute numbers and after normalization. In contrast,
the genes encoding ribosomal proteins varied enormously
in their expression levels (Table 7).
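
The length normalization described above is a division of the raw read count by the coding sequence length in kbp. The sketch below applies it to four genes using the counts and lengths listed in Table 7; the function name is illustrative.

```python
# (raw RNA-seq read count, coding sequence length in kbp), from Table 7.
table7 = {
    "cob": (12_226, 1.191),
    "cox1": (17_574, 1.986),
    "nad3": (2_816, 0.354),
    "nad7-1": (2_766, 1.182),
}


def reads_per_kbp(read_count, length_kbp):
    """Normalized expression: reads per kbp of coding sequence."""
    return read_count / length_kbp


for gene, (count, kbp) in table7.items():
    print(f"{gene:7s} {count:7,d} reads  {reads_per_kbp(count, kbp):9,.0f} reads/kbp")
```

The example illustrates why both measures are reported: nad3 has few reads in absolute terms because the gene is short, but after normalization its expression is comparable to that of cox1.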

Discussion 

Intactness of mitochondria 

To obtain mtDNA uncontaminated by nuclear or 
chloroplast DNA, crude but intact mitochondria were 
isolated. Because the mtDNA is located behind two per- 
meability barriers in intact mitochondria - an intact 
inner membrane as monitored by MDH latency [16] 
and an intact outer membrane as monitored by CCO 
latency [14,15] - treatment with DNase removed
contaminating chloroplastic and nuclear DNA without
degrading the mtDNA. The fact that only 2% of the
sequence reads were chloroplastic (Table 2) shows
that this strategy was successful.

Figure 2 Dot plot of the perennial ryegrass mitochondrial genome. Five repeats including four large inverted repeat pairs, IR1-IR4 and one 
direct repeat, DR (green) are marked by arrows. The inverted and direct repeat coordinates in the master molecule are: IR1, (178,334-228,600 bp; 
484,825-434,554 bp); IR2, (147,726-178,558 bp; 410,808-379,975 bp); IR3, (106,923-131,907 bp; 361,199-336,212 bp); IR4 (94,022-95,555 bp; 604,126- 
602,594 bp) and DR (498,590-507,147 bp; 566,604-575,159 bp). This figure was generated using the PipMaker software [20]. 



Difficulties of de novo assembly of a plant mitochondrial 
genome 

Although the average length of the gaps between the 
contigs was only 122 bp, it was not straightforward to 
close the gaps through PCR amplification of the missing 
DNA segments. This was mainly due to misassembled, 
duplicated and repetitive DNA sequences in the nine de 
novo assembled contigs. Few reports have been published
on mitochondrial genome sequencing using next-generation
sequencing due to the difficulties of assembling short
reads even when a reference genome exists [6].

Table 6 Transposable elements found in the perennial ryegrass mitochondrial genome

Repeat class                   Poaceae*                Triticeae*
                               Fragments   Length      Fragments   Length
Transposable element
1. DNA transposon              12          1,125       3           324
   -EnSpm                      3           220         3           324
   -Helitron                   4           329
   -MuDR                       2           118
   -hAT                        1           253
2. LTR retrotransposon         56          16,758      17          2,566
   -Copia                      20          4,939       3           153
   -Gypsy                      36          11,819      14          2,413
3. Non-LTR retrotransposon     27          4,662       2           118
   -L1                         27          4,662       2           118
Total                          95          22,545      22          3,008

*Reference DNA; boldface, sum of fragments and lengths.

Many next-generation sequencing platforms produce paired-end or mate-pair reads, which collectively can be referred to as read-pairs. Because the approximate physical distance between the two reads of a pair is known, the paired nature of these reads constitutes a powerful source of information that significantly facilitates genome assembly: read pairs can span repetitive regions and can therefore be used to join contigs.
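The way a read pair constrains the distance between two contigs can be sketched as follows. This is a minimal illustration and not the scaffolding procedure used in this study: the function name, the coordinate convention and all numbers are hypothetical, and a fixed insert size is assumed.

```python
from statistics import median


def estimate_gap(insert_size, contig_a_len, links):
    """Estimate the gap between contig A and contig B from read pairs.

    Assumptions (hypothetical, for illustration only):
    - insert_size is the expected outer distance between the two reads of a pair;
    - each link is (start_on_a, end_on_b): read 1 starts at start_on_a on
      contig A and points towards its end, read 2 ends at end_on_b
      measured from the start of contig B.
    """
    gaps = []
    for start_on_a, end_on_b in links:
        covered_on_a = contig_a_len - start_on_a  # bases spanned on contig A
        gaps.append(insert_size - covered_on_a - end_on_b)
    # The median is robust against a few mis-mapped pairs.
    return median(gaps)


# Three hypothetical pairs linking the end of contig A to the start of contig B.
links = [(9_700, 150), (9_650, 95), (9_800, 255)]
gap = estimate_gap(insert_size=572, contig_a_len=10_000, links=links)
print(f"estimated gap: {gap} bp")
```

Several independent pairs supporting the same join, with a consistent gap estimate, give more confidence than a single link, which is why repeats shorter than the insert size can often be bridged.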

It is currently not possible to isolate nuclear DNA from 
plants without having the nuclear DNA contaminated with 
organellar DNA (mitochondrial and chloroplast). Thus, 
organellar genomes are also to a certain degree being se- 
quenced as part of a nuclear genome sequencing project, 
and prior to filtering, the perennial ryegrass genome draft 
therefore also contains mitochondrial contigs and scaffolds. 
The mitochondria-related scaffold information was used to 
re-assemble, order and orientate the mitochondrial contigs 
in our mitochondrial genome assembly, and for primer de- 
sign to facilitate PCR amplification across gaps. 

Features of the perennial ryegrass mitochondrial genome 

The final assembly of the perennial ryegrass mitochondrial genome resulted in a single circular molecule of 678,580 bp with an average G+C content of 44.1% (Figure 1, Table 3). The G+C content is very similar to that of other sequenced plant mitochondrial genomes such as rice (Oryza sativa L.), 43.8%; bamboo (Ferrocalamus rimosivaginus L.), 44.1%; sugar beet (Beta vulgaris L.), 43.9%; melon (Cucumis melo L.), 44.5%; Arabidopsis (Arabidopsis thaliana L.), 44.8%; and rapeseed (Brassica napus L.), 45.2% [5,7,21-25]. The perennial ryegrass mitochondrial genome contains 73 genes, including genes encoding known proteins and RNAs. Among the identified genes, 36 (30 encoding proteins and 6 encoding tRNAs) are single-copy genes. A further 35 are copies of the multi-copy genes nad7, rps7, rps14, rpl5, rrn5, rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS and trnW, and the remaining two are the pseudogenes rps4-p and trnF-p (Table 4, Table 8). The 73 genes give a density of one coding region per 9.30 kbp, which is less compact than in bamboo, rice and Arabidopsis (one coding region per 7.73, 7.91 and 8.0 kbp, respectively) [21,22,25]. Gene distribution between the two DNA strands depends on the genomic configuration [26], but generally shows no extreme strand bias [25]. Of the protein-coding genes in the perennial ryegrass mitochondrial genome, two (nad1 and nad2) are trans-spliced, 21 are encoded on the forward strand, and 16 on the reverse strand. All the rRNA genes are located on the forward strand except the rrn26-1 gene, while 15 tRNA genes are found on the forward strand and 13 on the reverse strand (Figure 1, Table 4).

Table 7 Expression profile of the protein-coding genes in the perennial ryegrass mitochondrial genome

| Gene | RNA-seq read counts | Coding sequence (kbp) | Relative expression (reads per kbp) |
|---|---|---|---|
| nad1 | 2,392 | 0.975 | 2,453 |
| nad2 | 4,573 | 1.464 | 3,124 |
| nad3 | 2,816 | 0.354 | 7,955 |
| nad4 | 5,967 | 1.485 | 4,018 |
| nad4L | 1,391 | 0.312 | 4,458 |
| nad5 | 8,251 | 2.010 | 4,105 |
| nad6 | 2,426 | 0.942 | 2,575 |
| nad7-1 | 2,766 | 1.182 | 2,340 |
| nad7-2 | 2,766 | 1.182 | 2,340 |
| nad9 | 6,974 | 0.570 | 12,236 |
| cob | 12,226 | 1.191 | 10,265 |
| cox1 | 17,574 | 1.986 | 8,849 |
| cox2 | 6,083 | 0.780 | 7,799 |
| cox3 | 9,713 | 0.840 | 11,564 |
| atp1 | 15,214 | 1.527 | 9,963 |
| atp4 | 2,388 | 0.585 | 4,082 |
| atp6 | 4,633 | 0.711 | 6,516 |
| atp8 | 3,454 | 0.465 | 7,429 |
| atp9 | 5,361 | 0.222 | 24,150 |
| ccmB | 235 | 0.648 | 363 |
| ccmC | 1,327 | 0.720 | 1,843 |
| ccmFC | 1,702 | 1.431 | 1,190 |
| ccmFN | 3,739 | 1.767 | 2,116 |
| rps1 | 1,645 | 0.633 | 2,599 |
| rps2 | 1,635 | 1.461 | 1,119 |
| rps3 | 9,172 | 1.647 | 5,569 |
| rps4 | 503 | 1.032 | 487 |
| rps4-p | 424 | 1.029 | 412 |
| rps7-1 | 205 | 0.444 | 461 |
| rps7-2 | 205 | 0.444 | 461 |
| rps12 | 5,072 | 0.375 | 13,526 |
| rps13 | 388 | 0.348 | 1,115 |
| rps14-1 | 7 | 0.156 | 42 |
| rps14-2 | 7 | 0.156 | 42 |
| rpl5-1 | 52 | 0.549 | 96 |
| rpl5-2 | 52 | 0.549 | 96 |
| rpl16 | 4,963 | 0.555 | 8,942 |
| matR | 416 | 1.974 | 211 |
| mttB | 698 | 0.813 | 859 |
| Total | 149,414 | 35.514 | |

Gene name, RNA-seq read counts, protein-coding gene length in kbp, and the relative expression.

The coding and intron sequences occupy 7.2% 
(48,723 bp) and 3.9% (26,631 bp) of the genome, re- 
spectively, including 39 protein, 28 tRNA and 6 rRNA 
genes (Table 3). In the maize (Zea mays L.) NB mito- 
chondrial genome coding sequences account for 6.22% 
of the total genome [19]. Generally, the functional mito- 
chondrial rRNA and tRNA genes of the sequenced 
angiosperm mitochondrial genomes lack introns [19]. 
We found that one tRNA gene, trnL(CAA), contains an intron in the perennial ryegrass mitochondrial genome. Similar results have been found in the date palm (Phoenix dactylifera L.) mitochondrial genome, where three tRNA genes, trnK(TTT), trnN(ATT) and trnSup(CTA), also contained an intron [28]. Further work is needed to determine if the perennial ryegrass intron-containing tRNA gene, trnL(CAA), is functional.
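As a quick check, the proportions quoted in this section can be recomputed from the reported lengths. The sketch below is illustrative only: the constant and function names are not part of the original analysis, and the inputs are simply the values stated in the text.

```python
# Values as reported in the text (Table 3 and the preceding paragraph).
GENOME_BP = 678_580   # circular master molecule
CODING_BP = 48_723    # coding sequences
INTRON_BP = 26_631    # intron sequences
N_GENES = 73          # annotated genes, including copies and pseudogenes


def percent_of_genome(length_bp, genome_bp=GENOME_BP):
    """Return the share of the genome covered by length_bp, in percent."""
    return 100.0 * length_bp / genome_bp


coding_pct = percent_of_genome(CODING_BP)
intron_pct = percent_of_genome(INTRON_BP)
kbp_per_gene = GENOME_BP / N_GENES / 1000.0

print(f"coding: {coding_pct:.1f}%  introns: {intron_pct:.1f}%  "
      f"density: one gene per {kbp_per_gene:.2f} kbp")
```

Rounded to the precision used in the text, these values agree with the reported 7.2%, 3.9% and one coding region per 9.30 kbp.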

The variation in the number of mitochondrial genes between species is mainly due to differences in gene content for the subunits of complex II, and especially ribosomal proteins and tRNAs [18]. Multiple copies of rRNA genes were found in perennial ryegrass (Table 8), and also in the mitochondrial genomes of sugar beet and wheat (Triticum aestivum L.) [23,27]. All the known respiratory genes, except for the complex II genes sdh3 and sdh4, are present in the perennial ryegrass mitochondrial genome (Table 4). Both sdh3 and sdh4 genes are functional in tobacco and melon [5,29] but absent in all other species, as reviewed by Ma et al. [22]. Sdh4 has been identified as a pseudogene in Arabidopsis, rapeseed and sugar beet [7,23,25]. Although the perennial ryegrass mitochondrial genome contains some multi-copy genes, it lacks the rpl2, rps10, rps11 and rps19 genes (Table 4, Table 8). Rpl2 is missing in sorghum (Sorghum bicolor L.), Tripsacum, maize and sugar beet [22]; it is functional in Arabidopsis, rice, rapeseed, tobacco and melon [5,7,21,25,29]; and it is a pseudogene in wheat and bamboo [22,27]. Rps10 is only present in tobacco and melon [5,29]. The rps11 gene is present in liverwort (Marchantia polymorpha) but has not been found in eudicotyledonous and monocotyledonous species except for the rice mitochondrial genome, which retains rps11 as a pseudogene [21,30]. The rps19 gene is functional in rice, bamboo, and tobacco, and has been identified as a pseudogene in wheat and Arabidopsis. In the perennial ryegrass mitochondrial genome, we found multiple copies of the rps14 gene, which has only been identified as a functional gene in rapeseed [7]. The comparison of all 14 ribosomal protein and both complex II (sdh) genes in 280 diverse angiosperms has demonstrated frequent loss of some of these 16 mitochondrial genes during angiosperm evolution [31]. It seems that genes encoding ribosomal proteins and complex II proteins are species-specific. In order to understand gene loss and gain events in the angiosperm mitochondrial genome, it might be interesting to know the compensation pathway of the genes which are missing in the mitochondrial genome. One explanation is that genes which are no longer necessary for the cell to function can disappear entirely from the cell. A second is gene substitution or gene replacement, where the function of the missing mitochondrial gene is still needed and is directly replaced by a preexisting nuclear gene whose product can play the same role in the mitochondrion [32].

Table 8 Copy number of mitochondrial genes that differ in perennial ryegrass, bamboo, wheat, rice and maize

| Gene | Perennial ryegrass | Bamboo(a) | Wheat(a) | Rice(a) | Maize(a) |
|---|---|---|---|---|---|
| atp1 | 1 | 1 | 1 | 2 | 2 |
| atp4 | 1 | 1 | 1 | 2 | 1 |
| atp6 | 1 | 1 | 2 | 1 | 1 |
| atp8 | 1 | 1 | 2 | 1 | 1 |
| cox3 | 1 | 1 | 1 | 2 | 1 |
| nad1a | 1 | 1 | 1 | 2 | 2 |
| nad2c | 1 | 1 | 1 | 2 | 1 |
| nad2d,e | 1 | 1 | 1 | 2 | 2 |
| nad4 (exon letter illegible) | 1 | 1 | 1 | 3 | 1 |
| nad5a,b | 1 | 1 | 1 | 2 | 1 |
| nad7 | 2 | 1 | 1 | 1 | 1 |
| nad9 | 1 | 1 | 1 | 2 | 1 |
| rpl2 | 0 | 0 | 0 | 3 | 0 |
| rpl5 | 2 | 1 | 1 | 2 | 0 |
| rps1 | 1 | 1 | 1 | 1 | 2 |
| rps13 (?) | 1 | 1 | 1 | 1 | 2 |
| rps7 | 2 | 1 | 1 | 1 | 1 |
| rps14 | 2 | 0 | 0 | 0 | 0 |
| rps19 | 0 | 0 | 0 | 1 | 0 |
| rrn5 | 2 | 1 | 3 | 2 | 1 |
| rrn18 | 2 | 1 | 3 | 2 | 1 |
| rrn26 | 2 | 1 | 2 | 2 | 1 |
| trnC | 2 | 0 | 0 | 0 | 0 |
| trnD | 3 | 1 | 2 | 1 | 2 |
| trnE | 1 | 1 | 1 | 1 | 2 |
| trnfM | 1 | 1 | 3 | | 1 |
| trnI | 0 | 1 | 1 | | 2 |
| trnK | 2 | 1 | 3 | | 1 |
| trnM | 2 | 0 | 1 | | 0 |
| trnN | 1 | 0 | 0 | | 1 |
| trnP | 3 | 1 | 2 | | 2 |
| trnQ | 4 | 1 | 3 | | 1 |
| trnW | 2 | 0 | 0 | 0 | 0 |

Gene fragments, pseudogenes and chloroplast-derived genes are excluded. Genes with the same number of copies in all five species are not included.
(a) From bamboo (Acc. No. JQ235166 to JQ235179) [22], rice (Acc. No. BA000029) [21], maize (Acc. No. AY506529) [19], and wheat (Acc. No. AP008982) [27].
Lower case letter indicates exons of this gene.

The perennial ryegrass mitochondrial genome has three rRNA genes, rrn18, rrn5 and rrn26, encoding the small subunit and large subunit rRNAs, which are present in all characterized mitochondrial genomes. The rrn5 gene is very small (122 nucleotides). In contrast to the other rRNA genes, the mitochondrial 5S rRNA gene is absent in the mitochondrial genome of some fungi, animals and protists [3], and present only in land plants, a subset of algae [33] and in protozoa [34].

Plant mitochondrial genes are translated using the universal genetic code and require tRNAs for all 20 amino acids, and the composition of the tRNA genes encoded by plant mitochondrial genomes is to a high degree unique in angiosperms. In the perennial ryegrass mitochondrial genome, 27 functional tRNA genes are found for 14 amino acids. One pseudogene, trnF-p, remains non-functional in the genome. Post-transcriptional modification within the anticodon sequence might be necessary to generate a functional trnF-p gene. Thus, functional tRNA genes for six essential amino acids, Ala, Arg, Gly, Ile, Thr and Val, are missing from the perennial ryegrass mitochondrial genome (Table 5), although tRNAs for 20 amino acids are required for protein synthesis in mitochondria. The missing six are presumably encoded by the nuclear genome and imported from the cytosol into the mitochondria [35,36]. The tRNA gene content of the perennial ryegrass mitochondrial genome was compared with eight other grass mitochondrial genomes, and differences were observed with respect to presence or absence of tRNAs for Ala, Arg, Gly, Ile, Leu, Thr, Trp and Val among these plant species. Plastid-derived tRNAs were not considered in the comparison (Table 5). In the perennial ryegrass mitochondrial genome, twenty-four tRNAs display a classical cloverleaf structure, whereas each of the two tRNA-Ser (tRNA-Ser(TGA) and tRNA-Ser(GCT)) folds into an unusual four-loop secondary structure. One of the tRNAs (tRNA-Tyr(GTA)) has a two-stem cloverleaf structure. In the maize NB mitochondrial genome, tRNA-Ser(GCU) and tRNA-Ser(UGA) have a five-loop secondary structure [19].

In addition to protein- and RNA-coding genes, we identified 149 ORFs larger than 300 bp in the perennial ryegrass mitochondrial genome (Additional file 1: Table S1). Only ORFs found outside the genic regions of the mitochondrial genome were included in the analysis. The number of ORFs larger than 300 bp identified in the perennial ryegrass mitochondrial genome is comparable to that previously reported for other species such as maize (121), sugar beet (93), Arabidopsis (85), rice (461), wheat (179) and tobacco (110) [19,24,25,27,29,37].

Gene content in the repetitive regions 

The mitochondrial genome of perennial ryegrass contains multiple copies of the genes nad7, rps7, rps14, rpl5, rrn5, rrn18, rrn26, trnD, trnC, trnQ, trnK, trnM, trnP, trnS and trnW (Table 4, Table 8). All of the multi-copy protein genes are located in the inverted repeat regions, and multi-copy RNA genes are located in both direct repeat and inverted repeat regions (Figure 2). As for trnP, two copies were identical, whereas the third copy differed by a single nucleotide. The trnM-1 gene also differed from trnM-2 by a single nucleotide. Similarly, for trnQ in the wheat mitochondrial genome, two copies are identical, whereas the third copy differs by a single nucleotide [27]. A comparison of multi-copy mitochondrial genes among grass genomes such as ryegrass, bamboo, wheat, rice and maize (Table 8) suggests that gene duplication is a species-specific phenomenon [27]. The large repeated sequences cover 17.35% of the maize NB mitochondrial genome [19], while such sequences covered 17.12% in perennial ryegrass.
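The 17.12% figure can be reproduced from the repeat coordinates given in the legend of Figure 2. The sketch below counts the first listed copy of each of the five large repeat pairs; this counting convention is an assumption made here for illustration (the text does not state how the percentage was derived), and the names used are hypothetical.

```python
GENOME_BP = 678_580

# First listed copy of each large repeat pair, from the Figure 2 legend.
FIRST_COPY = {
    "IR1": (178_334, 228_600),
    "IR2": (147_726, 178_558),
    "IR3": (106_923, 131_907),
    "IR4": (94_022, 95_555),
    "DR": (498_590, 507_147),
}


def span_length(start, end):
    """Inclusive length of a coordinate span, regardless of orientation."""
    return abs(end - start) + 1


repeat_bp = sum(span_length(start, end) for start, end in FIRST_COPY.values())
repeat_pct = 100.0 * repeat_bp / GENOME_BP

print(f"{repeat_bp:,} bp in large repeats = {repeat_pct:.2f}% of the genome")
```

Under this convention the lengths sum to 116,177 bp, which matches the percentage reported in the text.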

Splicing 

Splicing is often part of post-transcriptional modification of messenger RNAs (mRNAs). It involves the excision of non-coding intron sequences from a precursor RNA and subsequent ligation of the flanking exon sequences to produce a mature transcript. Two types of splicing were found in the perennial ryegrass mitochondrial genome: cis-splicing, the intramolecular ligation of exon sequences on the same precursor RNA, and trans-splicing, involving the intermolecular ligation of exon sequences from different primary transcripts [38]. Trans-splicing is characteristic for angiosperm mitochondrial introns, particularly for genes encoding complex I subunits [18]. Of 26 introns of the protein-coding genes, only four were trans-spliced (in nad1 and nad2), confirming that trans-splicing is less common than cis-splicing [38]. In the perennial ryegrass mitochondrial genome, two genes, nad1 and nad2, encoding NADH dehydrogenase subunits 1 and 2, were each split into 5 exons. In nad1, exon nad1a was located far from the other four exons, on the other strand. In the case of the nad2 gene, exons nad2a and nad2b were found approximately 265 kb from the other three exons, on the other strand.

Figure 3 Amino acid sequence alignment of the perennial ryegrass rps4 and rps4-p genes with the rps4 gene of maize, sorghum, rice, wheat and bamboo. Amino acid sequences of the rps4 gene are from maize (Acc. No. YP_588274.1), sorghum (Acc. No. YP_762343.1), rice (Acc. No. YP_514660.1), wheat (Acc. No. ADE08097.1) and bamboo (Acc. No. AEK66732.1). Color code: white, conserved; green, identical; cyan, similar and yellow, different residues. This alignment was generated using the SDSC Biology WorkBench [42].

Genome diversity 

The mitochondrial genomes of flowering plants are
more complex than those of animals and fungi [39,40].
They extensively vary in size (ranging from 208 kbp 
in white mustard [4] to over 2,700 kbp in muskmelon 
[5]), gene content, genome rearrangement patterns 
and presence of repetitive sequences. Multiple copies 
of a few of the conserved full-length genes or exons 
are found in the mitochondrial genome (Table 8), 
which has also undergone size expansion when com- 
pared between plant species. In the perennial ryegrass 
mitochondrial genome, only 142,764 bp (21.03%) of 
the total DNA sequences encode proteins, RNAs and 
ORFs (Table 3), while the vast majority of the gen- 
ome sequence has unknown function. 

In the perennial ryegrass mitochondrial genome, the rps4-p and rps4 genes are conserved at the 5' end (599 nucleotides with 99% sequence identity) but do not share the remainder of the sequence. The BLASTn result confirmed that the 3' end of the rps4-p gene has no homology with the rps4 gene of other species reported so far. Thus, the rps4-p gene might be a variant of the ribosomal gene rps4 or a pseudogene in the perennial ryegrass mitochondrial genome. The amino acid sequences of the rps4 and rps4-p genes in perennial ryegrass were aligned with the amino acid sequences of the rps4 gene of maize (Acc. No. YP_588274.1), sorghum (Acc. No. YP_762343.1), rice (Acc. No. YP_514660.1), wheat (Acc. No. ADE08097.1) and bamboo (Acc. No. AEK66732.1) (Figure 3). The alignment showed that only 196 amino acids (45%) of the rps4-p gene are conserved with the rps4 gene. Transcriptome analysis confirmed that both rps4 and rps4-p are expressed in the reproductive tissue of perennial ryegrass. Both genes had a low normalized expression level (<1,000 reads per kbp length) (Table 7). In addition, we also found two ribosomal protein genes, rps3 and rpl16, sharing 110 bp of sequence (Figure 1, Table 4). The shared sequence was located in the second exon of rps3 (rps3b) and at the beginning of the rpl16 gene. Similarly, in the wheat KS3-type mitochondrial genome, KSorf1484 has a 46 bp shared sequence with the cob gene [41].
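Identity values such as those quoted for rps4 and rps4-p are typically derived from a pairwise alignment by counting matching columns. The following is a minimal sketch of one common convention (columns containing a gap are skipped); it is not necessarily the convention used by the alignment tools in this study, and the two short sequences are toy strings, not the actual rps4 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy example: 12 alignment columns, one gapped, two mismatches.
toy_identity = percent_identity("MPALRPKT-CRL", "MPAFRFKTGCRL")
print(f"{toy_identity:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity percentages are only comparable when the denominator is stated.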

Plant mitochondrial genomes contain transposable elements (TEs), DNA sequences that can move from one position to another. TEs can constitute an appreciable fraction of the genome and are found in most species with the exception of liverwort [33]. In the perennial ryegrass mitochondrial genome, we found 22 and 95 TE fragments of various sizes covering a total of 3,008 and 22,545 bp based on comparison to the Triticeae and Poaceae databases, respectively (Table 6). The sequences varied in length and the elements were dispersed in the perennial ryegrass mitochondrial genome (Additional file 5: Table S5). The TEs appear to be more abundant in grasses than cereals. In the perennial ryegrass mitochondrial genome, only 0.44% (Triticeae database) or 3.32% (Poaceae database) of the genome showed homology with transposable-like elements, most of which resemble retrotransposable elements, suggesting that their contribution to the expanded perennial ryegrass mitochondrial genome is negligible.
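The TE fractions follow directly from the totals in Table 6. The short sketch below recomputes them and also the share of retrotransposon-derived sequence in the Poaceae comparison; variable names are illustrative, and the inputs are the values reported in Table 6.

```python
GENOME_BP = 678_580

# Total TE-homologous sequence per reference database (Table 6).
TE_TOTAL_BP = {"Triticeae": 3_008, "Poaceae": 22_545}

# Class totals for the Poaceae comparison (Table 6).
POACEAE_CLASS_BP = {
    "DNA transposon": 1_125,
    "LTR retrotransposon": 16_758,
    "Non-LTR retrotransposon": 4_662,
}

te_pct = {db: 100.0 * bp / GENOME_BP for db, bp in TE_TOTAL_BP.items()}

retro_bp = (POACEAE_CLASS_BP["LTR retrotransposon"]
            + POACEAE_CLASS_BP["Non-LTR retrotransposon"])
retro_share = 100.0 * retro_bp / TE_TOTAL_BP["Poaceae"]

for db, pct in te_pct.items():
    print(f"{db}: {pct:.2f}% of the genome")
print(f"retrotransposon share (Poaceae): {retro_share:.1f}%")
```

With these inputs, roughly 95% of the TE-homologous sequence detected against the Poaceae database is retrotransposon-derived, consistent with the statement that most fragments resemble retrotransposable elements.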

Gene expression profile 

In this study, the expression profile of perennial ryegrass 
mitochondrial genes was studied in reproductive flower 
tissues (Table 7). In reproductive tissues, the mitochon- 
drial density and activity is especially high presumably 
because the energy and biosynthetic requirement is par- 
ticularly high during reproduction, i.e., during pollen 
development [43,44]. The generally high expression level 
of all of the encoded subunits of the respiratory complexes 
in the mitochondrial genome (Table 7) is consistent with 
that observation. However, there were some variations 
between the expression levels of the complexes: 

- Complex I, the NADH dehydrogenase, contains
around 40 subunits in higher plants, one copy each
per complex [45], of which nine are mitochondrially
encoded in perennial ryegrass (Table 4). The
normalized expression levels for all of the complex I 
subunits were generally high and varied about 3-fold 
(Table 7). 

- Complex IV, cytochrome c oxidase, contains 12-13 
subunits out of which three are mitochondrially 
encoded in most plants including perennial ryegrass 
(Table 4). All three subunits were highly expressed 
(Table 7). 

- Complex V, the ATP synthase, consists of about 15 
subunits, five of which are encoded by the 
mitochondrial genome in perennial ryegrass 
(Table 4), while the remaining are encoded in the 
nucleus, synthesized on free ribosomes in the 
cytosol and imported into the mitochondria to be 
assembled with the mitochondrially encoded 
subunits into a complex in the inner membrane. 
The normalized expression levels of all the ATP
synthase (complex V) genes were relatively high in
perennial ryegrass reproductive tissues and varied
<3-fold with the exception of atp9 (encoding
subunit c of the Fo), which had a 2.5 times higher
expression than the second highest, which is atp1
(encoding subunit alpha of the F1) (Table 7). This






may be significant given that subunit c of the 
complex is present in 10-15 copies per complex and 
the alpha subunit is present in three copies while all 
the other mitochondrially encoded subunits are only 
present in one copy each per complex V [46]. Thus, 
the mRNA levels in perennial ryegrass mitochondria 
as expressed by the normalized read numbers are 
correlated with the biosynthetic requirement for 
complex V subunits. Previous studies have shown 
that the atp1 gene is highly expressed in male flowers
of date palm, and in pollen mother cells of
Arabidopsis [28,47].
- Finally, the four genes encoding proteins involved in
cytochrome c biosynthesis, ccmB, ccmC, ccmFC and
ccmFN, were all expressed, but not at particularly
high levels (Table 7). An earlier microarray analysis
of mitochondrial gene expression at the early stage
of wheat shoot tissues reported that the ccmFN gene
showed increased transcript levels under three
different stress conditions: low temperature (4°C),
high salinity (0.2 M NaCl) and high osmotic
potential (0.3 M mannitol) [48].
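The normalized values discussed in the list above are read counts divided by coding sequence length in kbp (Table 7). The sketch below recomputes them for the complex V genes and ranks the genes; the function and variable names are illustrative, and small differences from the tabulated values can arise because the coding lengths in Table 7 are rounded.

```python
# (RNA-seq read count, coding sequence length in kbp) from Table 7.
COMPLEX_V = {
    "atp1": (15_214, 1.527),
    "atp4": (2_388, 0.585),
    "atp6": (4_633, 0.711),
    "atp8": (3_454, 0.465),
    "atp9": (5_361, 0.222),
}


def reads_per_kbp(read_count, cds_kbp):
    """Length-normalized expression: reads per kbp of coding sequence."""
    return read_count / cds_kbp


normalized = {gene: reads_per_kbp(*values) for gene, values in COMPLEX_V.items()}
ranked = sorted(normalized, key=normalized.get, reverse=True)
top_ratio = normalized[ranked[0]] / normalized[ranked[1]]

for gene in ranked:
    print(f"{gene}: {normalized[gene]:,.0f} reads per kbp")
print(f"{ranked[0]} / {ranked[1]} = {top_ratio:.1f}-fold")
```

Recomputed this way, atp9 is the most highly expressed complex V gene, at roughly 2.4 times the level of atp1, in line with the approximately 2.5-fold difference described above.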

Protein biosynthesis in plant mitochondria takes place
on bacterial-type ribosomes, where 14 of the subunits are
encoded in the perennial ryegrass genome (Table 4). Out
of these, eight had low normalized expression levels
(<1,000 reads per kbp length), while three (rps3, rps12 and
rpl16) showed high expression levels (>5,000 reads per kbp
length) (Table 7). Four of the ribosomal protein genes
were hardly expressed at all (rps14-1, rps14-2, rpl5-1 and
rpl5-2), which may be because they are non-functional, or
because they are required in other tissues, but not in the 
reproductive tissues in perennial ryegrass. Consistent with 
the latter hypothesis, two ribosomal protein genes, rps1
and rps19, are much more abundantly expressed in roots
than in other tissues of date palm [28] . 

Transcription does not necessarily lead to protein 
synthesis. An astonishing 48.5% and 30.8% of the total 
mitochondrial genomes of rice and date palm, respect- 
ively, are transcribed, which is due to RNA synthesis 
from large parts of the regions outside the annotated 
genes. For comparison, only 6.5% of the date palm 
mitochondrial genome is translated into proteins 
[28,37]. The functions of the transcribed inter-genic 
regions of plant mitochondrial genomes are not well 
understood. 

The expression of the mitochondrial genes in plants is carried out by phage-type RNA polymerase encoded in the nuclear genome [49]. The process of gene expression in plant mitochondrial DNA is rather complex, influenced by multiple promoters, RNA processing and, in particular, the post-transcriptional processes of splicing and editing [50-52]. The RNA polymerase is capable of promoter recognition, initiation and elongation on its own but needs auxiliary factors to recognize all transcription initiation sites [53]. The sequence analysis of the Arabidopsis mitochondrial genome showed that potential promoter motifs exist in the inter-genic regions [54]. In addition, a number of annotated genes do not show potential promoter sequences, confirming the possibility that other sequences can initiate transcription [55]. For this reason, transcription is actually initiated from a variety of promoter sites in the genome [50,55]. Thus, transcription in plant mitochondria produces both cryptic transcripts, from regions that do not contain genes or from the opposite DNA strand of the genes, and defective transcripts, encoded by the genes but failing to complete the complex post-transcriptional process to become functional transcripts [56]. Once initiated, transcription sometimes gives rise to extremely large transcripts due to the absence of efficient transcription termination in plant mitochondria [57]. This contributes significantly to the transcription of the inter-genic regions. Therefore, large portions of the mitochondrial genome are transcribed but not translated into proteins.

Conclusions 

For the first time, the mitochondrial genome of peren- 
nial ryegrass has been sequenced, successfully assembled 
and annotated. The data presented here constitute a pri- 
mary platform to understand the organization and func- 
tion of the mitochondrial genome in one of the most 
important forage and turf grass species. The circular 
mitochondrial master molecule will be useful for com- 
parative mitochondrial genomics and for future research 
on agronomically important traits such as CMS. 

Perspectives 

Perennial ryegrass is a dominant forage species in the tem- 
perate regions worldwide, and its main role is to provide 
forage to ruminant animals. Eighty per cent of the
world's cow milk and 70% of the world's beef and veal are
produced from temperate grasslands [58]. A major portion 
of these grasslands is covered by perennial ryegrass, which 
is, however, not well adapted to regions with severe winter 
or hot summer [58], unless the geographic range of the spe- 
cies can be extended by developing more robust cultivars. 
One way to increase productivity, nutritional quality and 
tolerance towards biotic and abiotic stress is to maximize 
the genetically available heterosis using hybrid breeding 
schemes. However, hybrid seed production requires a tool 
to efficiently control pollination, a tool such as CMS. The 
mitochondrial genome is a key to understanding the 
origin and function of CMS and will - in the long term - 
facilitate the development of hybrid cultivars in allogam- 
ous forage grasses. 
Methods

Plant material 

The perennial ryegrass genotype F1-30 was used for mitochondrial genome sequencing. F1-30 was developed from a cross between a genotype of the Italian cultivar Veyo and the Danish ecotype Falster [59]. The F1-30 genotype was multiplied by clonal propagation and grown in 15 cm x 15 cm plastic pots in the greenhouse in order to develop plant material for mitochondrial DNA (mtDNA) extraction.

Isolation of intact mitochondria 

The plants were kept in darkness for 24 h prior to mito- 
chondrial isolation in order to reduce the amount of 
starch in the chloroplasts. For each batch of mtDNA
isolation, a total of 30 g young leaves was collected from
4-month-old clones of F1-30 to isolate intact mitochondria.
All the equipment and buffers were kept at 4°C
before the extraction, and all the steps were conducted 
on ice or at 4°C. 

The leaves were cut into small pieces (5-10 mm) with 
scissors. Thirty g leaf pieces were homogenized for 1 min 
in 300 ml extraction buffer containing 0.3 M mannitol, 
5 mM EDTA, 30 mM MOPS (pH 7.3 adjusted with 1 M 
KOH), using a chilled mortar and pestle. The reagents 
0.2% (w/v) BSA, 5 mM DTT and 1% (w/v) PVPP, were 
added to the extraction buffer prior to use. The crude 
homogenate was filtered through two layers of cotton 
cloth followed by centrifugation at 2,000 g for 10 min to 
pellet starch, nuclei and chloroplast. The recovered super- 
natant was centrifuged at 10,000 g for 15 min to pellet in- 
tact mitochondria. 

The mitochondrial pellet was resuspended in 7 ml 
DNAse-I buffer containing 0.44 M sucrose, 50 mM Tris-HCl
(pH 8.0) and 10 mM MgCl2. Eight mg of DNAse-I
recombinant, grade I (Roche, Mannheim, Germany) was 
dissolved in 1 ml DNAse-I buffer, and added to the mito- 
chondrial suspension to give a final concentration of 
1 mg/ml DNAse-I. Digestion was allowed to continue on 
ice for 2 h to degrade any nuclear and chloroplast DNA 
present outside the mitochondria. The digestion was 
terminated by adding 0.5 M EDTA (pH 8.0) to a final 
concentration of 25 mM. Mitochondria were re-pelleted 
at 16,000 g for 10 min. The pellet was resuspended in 
25 ml of wash buffer (0.3 M mannitol, 1 mM EDTA, 
10 mM MOPS, pH 7.2 adjusted with 1 M KOH). Intact 
mitochondria were washed twice by resuspension in 
wash buffer and re-pelleting at 16,000 g for 10 min. 
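
The stock additions in this step follow the usual relation C_stock x V_added = C_final x (V_sample + V_added). As a minimal sketch (the helper name is hypothetical, and the 8 ml digest volume assumes the 7 ml suspension plus the 1 ml enzyme solution described above), the DNAse-I concentration and the volume of 0.5 M EDTA needed to reach 25 mM could be checked as follows:

```python
def stock_volume_to_add(sample_ml, stock_conc, final_conc):
    """Volume of stock (ml) to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_ml + v) for v.
    Both concentrations must use the same unit, and the sample is
    assumed to contain none of the solute beforehand.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return sample_ml * final_conc / (stock_conc - final_conc)


# 8 mg DNAse-I in 1 ml buffer, added to a 7 ml mitochondrial suspension
digest_ml = 7.0 + 1.0
dnase_mg_per_ml = 8.0 / digest_ml

# 0.5 M (500 mM) EDTA stock, target 25 mM in the digest
edta_ml = stock_volume_to_add(digest_ml, 500.0, 25.0)

print(f"DNAse-I: {dnase_mg_per_ml:.2f} mg/ml, EDTA to add: {edta_ml:.3f} ml")
```

The calculation ignores any EDTA carried over in the pellet, which is an assumption of this sketch rather than a statement from the protocol.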

Extraction, purification and precipitation of mtDNA 

The washed mitochondrial pellet was lysed by suspen- 
sion in 2 ml lysis buffer containing 10 mM Tris-HCl 
(pH 8.0), 10 mM NaCl and 1 mM EDTA (pH 8.0), 
followed by the addition of 10% SDS to a final concen- 
tration of 1% (v/v) and incubation at 37°C for 30 min. 



DNA was purified according to the standard method 
[60] with slight modifications. An equal volume of
phenol:chloroform (25:24 v/v) was added to the sample, which was
centrifuged at 20,800 g for 15 min at room temperature.
The aqueous phase was transferred to an Eppendorf
tube, and two additional cycles of phenol:chloroform
extraction and two cycles of chloroform extraction were
performed. The purified DNA was precipitated by
adding 0.1 volume of 3 M sodium acetate (pH 5.5) and 2
volumes of cold (4°C) absolute ethanol (99.9%). The
mixture was vortexed briefly and incubated at -20°C
overnight. The precipitated DNA was
recovered by centrifugation at 4°C at 20,375 g for 15 min. 
The ethanol was removed by decantation, 300 µl ice-cold
70% (v/v) ethanol was added, and the sample was centrifuged again at
4°C at 20,375 g for 5 min. The ethanol was removed and
the pellet was air dried and resuspended in sterile Tris/EDTA
buffer (pH 8.0) followed by an equal volume of R40
(40 µg/ml RNAse A (Roche, Mannheim, Germany) in
Tris/EDTA buffer). The mtDNA was checked for quality
by gel electrophoresis on a 1.5% agarose gel in 1x TAE
buffer (Additional file 6: Figure 1). DNA from two batches
of mtDNA isolation was pooled in order to obtain a suffi- 
cient amount for sequencing. 

Monitoring mitochondrial purity and intactness and 
protein concentration 

During the mitochondrial preparation, we kept 1 ml sam- 
ples from each preparation step (homogenate to super- 
natant) to be used for the determination of enzyme 
activity and protein concentration. The activity and latency
of cytochrome c oxidase (CCO), an inner membrane
enzyme, were measured at 550 nm in an assay medium
containing 0.3 M sucrose, 50 mM Tris, 100 mM KCl and
45 µM reduced cytochrome c, pH adjusted to 7.2 using
1 M acetic acid, plus or minus 0.05% (w/v) Triton X-100.
The activity and latency of NAD+-dependent malate
dehydrogenase (MDH), a matrix enzyme, were measured at
340 nm in the cuvette using 1 ml assay medium
containing 0.3 M sucrose, 20 mM MOPS-KOH, pH 7.0,
20 µl of 100 mM oxaloacetate, pH 7.0, 5 µl of 200 mM
salicylhydroxamic acid, 2 µl of 0.2 mM antimycin and 4 µl
of 50 mM NADH, plus or minus 0.05% (w/v) Triton
X-100. In both cases the enzyme latency was calculated as

$$\text{Percentage intact} = \frac{(\text{rate} + \text{Triton}) - (\text{rate} - \text{Triton})}{(\text{rate} + \text{Triton})} \times 100\%$$

[14]. The latency of CCO activity is a measure of the
integrity of the mitochondrial outer membrane, as the substrate,
reduced cytochrome c, cannot penetrate an intact outer
membrane to reach the active site on the outer surface of
the inner membrane. The latency of MDH activity is a
measure of the integrity of the mitochondrial inner
membrane, as NADH cannot cross an intact inner
membrane to reach the enzyme present in the
mitochondrial matrix [14,15].
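
The latency calculation is simple arithmetic on the two measured rates. The following is a minimal sketch, assuming both rates are given in the same units; the function name and the example rates are hypothetical and are not measurements from this study:

```python
def percent_intact(rate_plus_triton, rate_minus_triton):
    """Enzyme latency (% intact) from rates measured with and without Triton X-100.

    rate_plus_triton  : activity after membranes are disrupted (total activity)
    rate_minus_triton : activity with membranes left intact (accessible activity)
    """
    if rate_plus_triton <= 0:
        raise ValueError("rate with Triton must be positive")
    return (rate_plus_triton - rate_minus_triton) / rate_plus_triton * 100.0


# Hypothetical absorbance changes per minute, for illustration only
example_latency = percent_intact(rate_plus_triton=0.50, rate_minus_triton=0.05)
print(f"{example_latency:.1f}% intact")
```

A value close to 100% indicates that almost no substrate reached the enzyme before the detergent was added.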

The protein concentration in the various fractions was 
measured at 562 nm using a bicinchoninic acid (BCA)
protein assay kit (Sigma) containing BCA working re- 
agent, BSA protein standard and 5% deoxycholic acid as 
recommended by the manufacturer. 

Library preparation and next-generation sequencing 

A sequencing library for F1-30 was prepared according to
the Rapid Library Preparation Method Manual October 
2009 (Rev. Jan 2010) using 500 ng mtDNA. Sequencing 
was performed on a Roche 454 GS-FLX Titanium instru- 
ment (software version 2.3) following the manufacturer's 
recommendations. 

De novo assembly of the mitochondrial genome 

Adaptor removal, quality filtering (quality score 99.8%) and
removal of chloroplast reads were
performed using the CLC Genomics Workbench software
(v.5.0). The chloroplast reads were identified by performing
a reference assembly against the perennial ryegrass
chloroplast genome (GenBank Acc. No.: NC_009950.1) using
the parameters: similarity, 0.98; conflict resolution, vote
(A, C, G, T); non-specific matches, random; and
masking of references, none. Reads mapping to the
chloroplast genome were subsequently removed from
the dataset. The remaining reads were assembled into
contigs using the CLC Genomics Workbench software 
(v.5.0) with the parameters: Mismatch cost, 2; Insertion 
cost, 3; Deletion cost, 3; Length fraction, 0.5; Similarity 
fraction, 0.99. Gap closure and manual inspection and 
editing were performed using the SeqMan software 
(v.5.0.3). Mitochondrial genome contigs were identified
by BLASTn (E-value 1 × 10^-10). The PipMaker software
[20] was used to identify repetitive regions within and
among the mitochondrial contigs in order to facilitate
primer design. DNASIS Max (v.2.9) was used to run in-house
BLAST searches against the perennial ryegrass genome
scaffolds in order to validate the order of the contigs in the
mitochondrial genome. Primers were designed using
the Primer3 software (v.0.4.0). Genomic DNA of F1-30
was used as template for PCR to amplify the gaps between 
contigs. The purified PCR products were sequenced by 
Eurofins MWG Operon (Ebersberg, Germany). The gap 
sequences were incorporated into the assembly using the 
SeqMan software. 
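
The BLASTn screening step amounts to keeping every contig with at least one hit at or below the E-value threshold. The sketch below assumes the hits are available in the standard 12-column tabular BLAST output (E-value in the 11th column); the function name, contig names, subject identifiers and hit lines are hypothetical, and the authors' actual tooling for this step is not described in the text:

```python
def contigs_with_hits(tabular_lines, max_evalue=1e-10):
    """Return the set of query contigs with at least one hit at or below max_evalue.

    Expects tab-separated BLAST tabular lines:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    keep = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            keep.add(query_id)
    return keep


# Made-up hit lines, for illustration only
example_hits = [
    "contig_1\tsubject_a\t98.5\t1200\t18\t0\t1\t1200\t501\t1700\t0.0\t2100",
    "contig_2\tsubject_a\t81.0\t60\t11\t0\t5\t64\t10\t69\t2e-04\t52.8",
]
mito_contigs = contigs_with_hits(example_hits)
print(sorted(mito_contigs))
```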

Genome annotation and analysis 

Annotation of the mitochondrial genome was done 
using the Maker2 pipeline [61]. In a first round of
analysis we used an in-house assembly of the F1-30
transcriptome (unpublished) as initial evidence for gene
prediction. We also utilized a collection of plant mito- 
chondrial protein sequences from various organisms 
included in the genome annotation software package 
Mitofy for plant mitochondria (http://dogma.ccbb.utexas.edu/mitofy).
Repeat masking was performed with a grass-specific
repeat database from RepBase [62]. After an initial
round of gene prediction, a training file was generated for 
the ab initio gene predictor SNAP, and an additional 
round of gene prediction was performed. The resulting 
GFF3 file was loaded into Apollo [63] for visualization and 
manual curation of the genes after taking all the available 
evidence into account. Structural RNA genes were identified
using tRNAscan-SE 1.21 (for tRNAs) and RNAmmer
1.2 (for rRNAs) [64,65]. The search for ORFs was performed
using CLC Genomics Workbench (v.5.0).

Sequence repeats were investigated using PipMaker 
with default parameter settings [20]. SIRs were detected
using the Inverted Repeats Finder software [66] (match 2;
mismatch 3; delta 5; match probability 80; indel
probability 10; minimum alignment score 40; maximum
length to report 100,000; maximum loop 100,000;
maximum loop separation for tuple of length 4). Tandem
repeats were detected using the Tandem Repeats Finder
(v.4.04) [67] with the parameters: alignment
parameters (match, mismatch, indel), 2, 7, 7; minimum
alignment score, 50; maximum period size, 2,000. SSRs were identified
using the msatcommander 0.8.2-WINXP.Zip software 
package [68]. The parameters for SSR detection were 
1- to 2-nucleotide (nt) repeats of at least 10 nt length 
and 3- to 6-nt repeats with at least three repeat units 
(Additional file 4: Table S4). 
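
The SSR thresholds can be illustrated with a small regular-expression search. This is a simplified sketch and not the msatcommander algorithm: it reads "at least 10 nt" as 10 units for mononucleotide and 5 units for dinucleotide motifs, scans one strand only, and does not treat compound or overlapping repeats specially. All names and the example sequence are hypothetical.

```python
import re

# Minimum number of repeat units per motif length (assumed reading of the thresholds)
MIN_UNITS = {1: 10, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, end, motif, units) tuples for perfect SSRs in seq (0-based, end-exclusive)."""
    seq = seq.upper()
    hits = []
    for k, min_units in MIN_UNITS.items():
        # A k-nt motif followed by at least (min_units - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Made-up sequence containing an A(10), an (AT)5 and a (GAT)3 repeat
example_seq = "GGC" + "A" * 10 + "CG" + "AT" * 5 + "C" + "GAT" * 3 + "C"
example_ssrs = find_ssrs(example_seq)
for start, end, motif, units in example_ssrs:
    print(start, end, motif, units)
```

The primitive-motif check prevents a mononucleotide run from being reported again as a di- or trinucleotide repeat.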

TEs were identified using CENSOR with default parameter
settings, using Poaceae and Triticeae as reference [62].

RNA preparation and sequencing 

For transcriptome analysis, pollen and stigma tissue 
samples were collected from the F1-30 genotype grown
under standard growing conditions in a greenhouse. A 
sealed paper bag was put over the inflorescences for 
8 hours at anthesis to collect the pollen. The pollen 
was harvested after 8 hours, frozen in liquid nitrogen 
and stored at -80°C. Unpollinated stigmas were iso- 
lated from flowers just before anthesis, frozen in liquid 
nitrogen and stored at -80°C. Total RNA was extracted 
from each sample using the RNeasy Plant Mini Kit (Qiagen, Valencia,
CA, USA) following the manufacturer's instructions,
and the RNA integrity was measured with
an RNA 6000 Nano LabChip™ on the Agilent 2100
Bioanalyzer (Agilent Technologies, Santa Clara, CA, 
USA). Samples were sequenced on an Illumina HiSeq2000 
system. 

Read quality and trimming of sequences

Using the program FastQC (Babraham Institute, Cambridge,
UK), we visualized the read quality and length
of the Illumina raw reads. The output of this program
showed that Illumina adaptors were present
at the 3' end of the reads and that the
paired-end reads were overlapping. Taking advantage of this overlap,
we first removed the Illumina adaptors from the 3' end
of the reads using the program Homer-Tools [69] and then
merged read pairs with an overlap of
10 bp using the program fastq-join.pl [http://code.google.com/p/ea-utils].
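
The idea behind merging overlapping read pairs can be sketched as follows. This is a simplified illustration and not the fastq-join algorithm: it requires an exact match in the overlap, ignores base qualities, and treats 10 bp as the minimum overlap, which is an assumption. The function names and the example fragment are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair whose 3' ends overlap; return None if no overlap is found.

    read2 is reverse-complemented so that both reads are on the same strand,
    then the longest exact suffix/prefix overlap of at least min_overlap is used.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Made-up 44 bp fragment sequenced from both ends with 30 bp reads (16 bp overlap)
example_fragment = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGATTACGGCA"
example_read1 = example_fragment[:30]
example_read2 = reverse_complement(example_fragment[-30:])
merged = merge_pair(example_read1, example_read2)
print(merged == example_fragment)
```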

Transcriptome analyses 

Illumina 101 bp reads from reproductive tissue samples 
were used for gene expression analysis of the 39 protein- 
coding genes. Reads were mapped onto the sequences of 
the 39 genes using Bowtie [70], allowing a maximum of 
2 mismatches in the first 25 bp. The program RSEM 
[71] was used to calculate RNA-seq read abundance 
from the SAM alignments. The expression of each gene 
was calculated by dividing the abundance estimates from 
RSEM by the length of the gene (kbp). 
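
The length normalization described above is a per-gene division. A minimal sketch, assuming the RSEM abundance estimates and the gene lengths (in bp) have already been read into dictionaries; the function name, gene names and values are hypothetical placeholders, not results from this study:

```python
def length_normalized_expression(abundance, gene_length_bp):
    """Divide each gene's abundance estimate by its length in kbp."""
    return {
        gene: abundance[gene] / (gene_length_bp[gene] / 1000.0)
        for gene in abundance
    }


# Placeholder values, for illustration only
example_abundance = {"gene_a": 1500.0, "gene_b": 300.0}
example_lengths = {"gene_a": 1500, "gene_b": 600}
normalized = length_normalized_expression(example_abundance, example_lengths)
for gene, value in sorted(normalized.items()):
    print(gene, round(value, 1))
```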